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We present an efficient method for the calculation of free energy landscapes. Our approach involves 
a history dependent bias potential which is evaluated on a grid. The corresponding free energy 
landscape is constructed via a histogram reweighting procedure a posteriori. Due to the presence 
of the bias potential, it can be also used to accelerate rare events. In addition, the calculated free 
energy landscape is not restricted to the actual choice of collective variables and can in principle 
be extended to auxiliary variables of interest without further numerical effort. The applicability 
is shown for several examples. We present numerical results for the alanine dipeptide and the 
Met-Enkephalin in explicit solution to illustrate our approach. Furthermore we derive an empirical 
formula that allows the prediction of the computational cost for the ordinary metadynamics variant 
in comparison to our approach which is validated by a dimensionless representation. 

Keywords: Free energy landscape, Essential dynamics, Metadynamics variants. Histogram reweighting 



INTRODUCTION 



The characteristics and the behaviour of physical sys- 
tems can be understood and predicted by the investiga- 
tion of the underlying free energy landscape. The knowl- 
edge of this often complex surface allows one to deter- 
mine reaction pathways for chemical reactions as well 
as stable configurations for proteins [1-5, 9] and glass- 
forming systems [6-8]. A successful tool for these studies 
are computer simulations which help to explore the char- 
acteristics of the system in detail. Naturally, during the 
last years a lot of effort has been spent to develop novel 
efficient and time-saving methods. 

Specifically in protein sciences, the free energy approach 
is promising for the investigation of unknown folding 
mechanisms. To achieve a proper description of the 
dynamical behaviour, the trajectories may be projected 
onto a set of well chosen collective variables [H, Q which 
allow to define an effective energy landscape. These vari- 
ables can be interpreted as effective reaction coordinates 
spanning the phase space respectively the free energy 
space of the system. Well known collective variables are 
the number of native contacts, dihedral angles and the 
essential eigenvectors or principal components of the pro- 
tein [H, m . Thus a set of well suited reaction coordinates 
offers the opportunity to describe the folding or unfolding 
of a protein adequately without too much loss of infor- 
mation. 

However, it has to be mentioned that in some cases suit- 
able collective variables are hard to determine. The in- 
vestigation of the underlying dynamics of the system can 
then be performed by computational methods like dis- 
crete path sampling [H, and transition path sampling 
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12\ to overcome this problem. These techniques rep- 
resent computational algorithms which are not depen- 
dent on the usage of collective variables. 
Experiments and computer simulations have shown that 



naturally occuring proteins often have free energy land- 
scapes with global funneling properties [H-lsl, [H, [H, [sf. 
Nevertheless local minima can occur which represent 
trapped conformations of the protein [l|, ^ . The transi- 
tions between these minima which are called rare events 
occur on timescales which are typically not accessible in 
computer simulations. Over the last years several nu- 
merical methods have been developed to accelarate rare 
events and to compute the underlying free energy land- 
scapes. In the following we will mention the most com- 
mon ones which have been well established in the scien- 
tific community. 

Often used methods are thermodynamic integration [isl - 
15] and free energy perturbation [l6|. These techniques 
are well suited to calculate static properties like hydra- 
tion energies but normally fail in the calculation of free 
energy differences between specific folding mechanisms. 
To overcome this situation and to gain insight into dy- 
namical quantities, more sophisticated ideas have to be 
adopted. 

The umbrella sampling method [17] allows to determine 
free energy differences via the corresponding probability 
distributions of the whole accessible phase space. An ef- 
fective additional biasing potential drives the system to 
more unlikely regions such that the whole phase space is 
visited. This idea has also been used in adaptive force 
biased calculations 18, 19|, steered as well as adiabatic 



Molecular Dynamics simulatio ns |2Q[ l2l| and multicanon- 



ical sampling approaches |22l-l24j. Another promising 



method for the investigation of folding properties is the 
replica-exchange algorithm [25]. 

Specifically in the last years several techniques have been 
proposed which employ an adaptively varying bias po- 
tential as an estimate for the free energy. Examples 
for these methods are the Metadynamics algorithm in 
its several variants 26, uTl and the related local eleva- 



tion method respectively the conformational flooding ap- 
proach g8|,i29|]. 
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The main idea of the Metadynamics method relies on 
a successive flattening of the free energy landscape by 
an additional potential energy in form of small gaussian 
hills which are positioned at relaxed times at the present 
location of the system. Then the free energy can be esti- 
mated as a negative mirror of the potential energy which 
is exact in the limit of long simulation times [27| . It has 
been shown that this method is in principle valid for all 
collective variables and offers a broad range of applica- 
bility [27|. Even the calculation of microscopic averages 
within the metadynamics scheme has been successfully 
examined [30| . Nevertheless the drastic error dependence 
on the parameters of the algorithm [31, 'si] in the origi- 
nal metadynamics scheme has to be mentioned. Keeping 
the errors within tolerable ranges results in a drastic in- 
crease of simulation time. To avoid local bumps in the 
landscape and to keep the errors low, extensions of the 
existing Metadynamics method 33|-36| have been pro- 
posed which offer a more reliable derivation. 
In this paper we present a Histogram reweighted Meta- 
dynamics technique which offers the opportunity to over- 
come some drawbacks of the original method and its vari- 
ants. Our method is grid-based which means that the 
energy landscape spanned by the collective variables is 
divided into several regions. By adding a short range cut- 
off biasing potential which is only evaluated at the grid 
points of a predefined grid, the corresponding simula- 
tion time can be massively decreased. Finally, the corre- 
sponding biased probability distributions are reweighted 
by histogram techniques j37H4Q] to compute the free en- 
ergy landscape a posteriori. It has been noticed that 
these techniques offer a low error dependent description 

Additionally a projection scheme is introduced which al- 
lows the investigation of further collective variables if the 
probability distributions of the studied variables are well 
overlapping and if the projected coordinates are well- 
suited and fastly varying. Although it is clearly impossi- 
ble to reconstruct the whole highly dimensional phase 
space of the protein, our projection scheme allows to 
investigate additional low dimensional reaction coordi- 
nates like dihedral angles or spatial distributions in good 
agreement to directly derived energy landscapes. As a 
consequence of this scheme, no further simulation time 
is required as the analysis can be performed a posteriori 
and the landscape is not restricted to the actual chosen 
set of coordinates. 

This becomes useful by regarding the fact that difficulties 
for predefined collective variables may arise, if they are 
not the true reaction coordinates of the system [23, |4l| 
or give unsuitable descriptions of the corresponding free 
energy landscapes. This is in particular a problem if 
the underlying folding or unfolding mechanisms are too 
complex for being projected onto a low dimensional sub- 
space which may lead to wrong conclusions [27] or if the 
phase space cannot be sampled efficiently. In these cases. 



discrete path sampling [sl, [lo| , transition path sampling 

ii|,|i3 

and additional methods (Zll-li^ allow to investi- 
gate the hidden complexity of the free energy landscape 
without the usage of collective variables as alternative 
approaches. 

By comparison to other approaches, it has to be no- 
ticed that in recent publications [sH, |45|, \^ the usage 
of umbrella sampled biased probability distributions re- 
spectively using histogram reweighting procedures [48| 
for the correction of free energy landscapes has also been 
claimed. Even the evaluation of the underlying potential 
on a grid point has been recently proposed for metady- 
namics [46] as well as for a closely related adaptive bias 
molecular dynamics scheme [47]. Both ingredients have 
also been applied in a grid-based adaptive umbrella sam- 
pling scheme [49|] over a decade ago and the above men- 
tioned techniques are implemented in common software 
packages 5Q|-55|. 

A main reason for the introduction of grid evaluations 
is a massive decrease in computation time which scales 
by a constant factor. This fact is also used in [46[ where 
smoothly truncated Gaussians are evaluated by a kd-tree. 
Further realizations of grid techniques incorporate adap- 
tive grids where the potential energy is calculate d by 
a polynomial extrapolation on the grid points [5l|, l53|, 
choosing the potential of a close grid point as the bias- 
ing potential on the particle (5l| or applying cut-off radii 
for the gaussians [5l!. It was also shown that the above 
mentioned adaptive biased grid approach in its computa- 
tional cost scales linearly with simulation time in contrast 
to ordinary metadynamics [i^ • Here cubic B-splines are 
used for the evaluation of the biasing potential on the 
corresponding grid points. 

Our approach has the advantage of a well-defined sim- 
ple potential all over the energy surface. This allows to 
change the resolution of the grid on-the-fiy even after the 
simulations have finished to resolve regions of specific in- 
terest in more detail. In addition, it is easy to implement 
and scales with a constant number 0{2d) of calculations 
per timestep, where d is the number of collective vari- 
ables, respectively the dimension of the grid. 
As a further remark, our potential energy landscape 
is purely used as a biasing potential to accelerate rare 
events. Hence, the fine resolution of the free energy land- 
scape is achieved by histogram techniques. This allows 
us to use a very rough potential energy surface created in 
a short simulation time. Additionally the bias force is al- 
ways well-defined such that discontinouities are avoided. 
Therefore our method in its interpretation is closely re- 
lated to the Wang-Landau approach [56] and adaptive 
umbrella biasing techniques [49] with the effectiveness of 
(47} . This finally results in a simple and robust algorithm 
which is easy to implement in common public software 
packages like GROMACS or other programs and 

allows to tune the resolution of the landscape on demand 
easily after the simulations have finished. 
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We further derived an empirical formula for the com- 
putational cost required for the ordinary metadynamics 
scheme in comparison to our approach. Thus we were 
able to show that the grid technique for reasons of com- 
putational efficiency is often more preferable especially 
for a large number of hills in comparison to the number 
of atoms in the system. 

The paper is organized as follows. In the second section 
we introduce the method and the theoretical background. 
The third and the fourth section illustrate model test 
cases and the numerical details of the following peptide 
simulations. The fifth section presents the results for the 
alanine dipeptide and the Met-Enkephalin. In the sixth 
section the values for the computational cost required for 
both methods are investigated and an empirical formula 
is derived. We conclude with a brief summary in the last 
section. 



HISTOGRAM REWEIGHTED METADYNAMICS: 
THEORETICAL BACKGROUND 

The system we consider is described by a set of coordi- 
nates X evolving under the action of dynamics following 
the trajectory x{t) and described by a canonical equi- 
librium distribution at temperature T. The set of coor- 
dinates X may include atomic positions or angles as well 
as any other auxiliary collective variable representing the 
characteristics of the system. 

If the system shows metastability, some regions separated 
by large energy barriers cannot be explored by the evo- 
lution of the trajectories in a reasonable simulation time. 
Guided by the conventional metadynamics approach an 
additional potential has to be added at specific constant 
times ^1,^2, •••^AT on the trajectory x{t) to overcome the 
barriers and to accelerate the rare events. It has to 
be ensured that the protein relaxes between these times 
such that the system diffuses in the next local minimum. 
For the force evaluation in the ordinary Metadynamics 
method, the sum over all previously added gaussian func- 
tions has to be performed [26] which results in a strong 
increase of computation time. To avoid this increase, we 
use an approach where the additional potential energy is 
evaluated on the grid points of a predefined grid closely 
related to (46|, |47|, , spanning the whole range of the 
accessible phase space in the set of collective variables x. 
The grid points Xq ^ are separated by the grid constant 
cr^, which is the distance between two neighboring grid 
points in one dimension. The system now evolves in time 
moving over the energy landscape covered by the grid. At 
the times t = ti, ^2, ...t^ the following potential energy 



defined by 



AVrix^^^.t) = g{x-x^Juj 
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is evaluated on all grid points N^^^ whose distance from 
x{t) is closer than the cut-off radius Xc. The magnitude 
of the potential energy is given by the height uj like in 
the conventional metadynamics scheme. As it has been 
discussed in [27] the values for uj have to be low. 
The potential energy at the grid points then evolves in 
simulation time with 

Vt , t„ ) = Ft , t;v- J + AFt , t„ ) (4) 

emerging rapidly if the specific neighborhood of the grid 
point is often visited by the trajectory, e.g. in free energy 
minima. Then we need to update 2d values of grid points 
for each calculation, where d is the number of collective 
variables or in other words the dimension of the grid. 
The biasing potential exerted on the system at times t > 
yields 



VB{x,t) 



Ngk 



(5) 



where the summation is over the number of grid points 
Ngk within Xc, with the actual value of the poten- 
tial Vt{Xq i7^Ar) of each grid point at X Q ^ as defined in 
Eqn. dl]) and the weighting factor Q{x — x^^) of Eqn. ([2]). 
The resulting local minimum between two grid points can 
usually be neglected if the distance between neighboring 
grid points cfq is small compared to the typical expected 
diameter of a free energy minimum, respectively a high 
resolution of the grid. Repeating the whole scheme al- 
lows to fill the minima efficiently until the potential en- 
ergy is reached to overcome the energetic barriers. The 
energy landscape with constant values Vt{x^ .) and the 
corresponding continuous function Vb {x) are finally used 
in the biased simulation runs to produce fiat probability 
distributions. An illustration of the scheme is presented 
in Fig. [H 

It has to be noticed that th e sp ecific choice of the biasing 



potential is arbitrary [27|, |46| although in combination 



with the grid potential evaluation it can be pointed out 
that the function proposed in Eqn. ([2]) offers the advan- 
tage of a fast computation. Due to the well-defined po- 
tential at every position, it can be even used to achieve a 
finer resolution of the grid in combination with histogram 
reweighting techniques. As a further remark, it has to be 
mentioned that the final biased simulations as well as the 
previous simulation runs for the construction of the bias- 
ing potential can be trivially parallelized in the spirit of 
the multiple walker metadynamics technique [61] to save 
computation time. 

In the following we give a short description of the his- 
togram reweighting technique. The free energy landscape 
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FIG. 1: Illustration of the Histogram Reweighted Metady- 
namics scheme with Xc = cr^ = la. Panel A shows the con- 
stant potential energy Vt{x^ .,t) of the grid points at times 
T <t < n. The biasing potential energy Vsix^t) of Eqn. ([5]) 
for times r < t < n is shown in Panel B. At times t — ti 
(Panel C) a new gaussian function AVT(x,t) is placed at the 
actual position of the system at x = —1.25a which is only 
evaluated at the grid points such that AVt{x^ .) is summed 
to the old potential Vt{x^ . , r) via Eqn. Q yielding the new 
values Vt{xq . , n) (shown as circles). The resulting new bias- 
ing potential function Vsix^t) is shown in Panel D. 



is finally calculated via the WHAM equations |37l-l4Q| 
by a reweighting of the biased probability distributions. 
This procedure allows the calculation of the free energy 
within a finite tolerance value |40| and its combination 
to the biasing potential for refinement and construction 
has been shown to be advantageous [^,[3ll, 0,(53. The 
scheme is given by two equations 



P{x) = 



E 



and 



F, = 



-ksT log i P{x)e- 

\xbin3 



VB{x)/kBT 



(6) 



(7) 



which allow to compute the best estimate of the unbi- 
ased probability P{x) where Ngim denotes the number 
of independent simulations, ni{x) the number of counts 
in a histogram bin associated with x, Vsix) is the final 
constant biasing potential of Eqn. (|5|) at position x and 
the free energy shift is Fi for each simulation with ther- 
mal energy ksT. Due to the fact that Fi and P{x) are 
unknowns, Eqn. ([6]) and Eqn. ([7]) have to be solved by 
iteration to self-consistency within a predefined tolerance 
value. The values of P{x) can then be used to calculate 
the resulting free energy of the bin via 



F{x) 



(8) 



where P(0) is a reference value and R denotes the molar 
gas constant [s^. It has to be pointed out, that the er- 
ror in the energy is strongly dependent on the resolution 
of the histogram leading to more or less pronounced ap- 
proximations. For a detailed description of the method 
we refer the reader to Refs. 
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As it was mentioned in the introduction, the main part of 
a successful calculation relies on the choice of appropri- 
ate collective variables on which the free energy surface 
is spanned. Well-suited collective variables are dihedral 
angles as well as the essential eigenvectors of the sys- 
tem [Hil. It has been recently demonstrated that the 
application of eigenvectors as collective variables results 
in adequate descriptions of the folding mechanisms and 
of the free energy landscape in ordinary Metadynamics 
computations [gsI, Q • Thus we follow this approach due 
to the assumption that nearly all relevant motion is cap- 
tured in the first eigenvectors of the system [62]. This is 
remarkable by regarding the fact that drastic difficulties 
appear if wrong reaction coordinates are chosen which do 
not capture the main motion [27|, |4l|. But nevertheless, 
it has again to be remarked that the usage of eigenvectors 
is not a guarantee for correctness of the free energy land- 
scape due to hidden complexities in higher dimensional 
collective variables [27]. In the following we give a brief 
description of the method. 

The essential eigenvectors can be calculated by the super- 
imposed coordinates f of N atoms of the system which 
build the covariance matrix C via 



C =< (r- < r >)(r- < r >)^ > 



(9) 



where z,j = 1,2,... 3 A/" and the time- averaged or ref- 
erence value is denoted by the brackets < ... >. The 
diagonalization of C leads to 



C = EAE 



(10) 



where E is a matrix of eigenvectors and A is a matrix of 
eigenvalues marking the positional fluctuations. Sorting 
the eigenvalues in decreasing order allows to identify the 
largest positional fluctuations with all important struc- 
tural transitions by the first eigenvectors which form the 
essential dynamics of the system [62[. The projection at 
time t on the i—th eigenvector is then defined by 



Pi{t) = {r{t)- <r>)-ei 



(11) 



with the specific eigenvectors ranging from i = 
1,2... 37V. 

If the motion of the system is not restricted to the es- 
sential subspace in the biased simulations such that even 
lower eigenvectors contribute, the potential biasing en- 
ergy of Eqn. dSJ even activates the motion of the remain- 
ing subspace. Hence the concerted motion of all eigenvec- 
tors is influenced by the biasing energy. Therefore a pro- 
jection scheme allows to construct the free energy land- 
scapes of additional collective variables like lower eigen- 
vectors or dihedral angles a posteriori. 
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The main idea is that the apphed bias potential energy 
drives the system through structural transitions which 
could be in principle also observed in additional collec- 
tive variables. The occurrence of the variables x{t) have 
to be projected onto the new collective variable ^(t) ex- 
posed at the same time. Formally, this is given by 

/ : x{t) ^ at), (12) 

together with the corresponding biasing potential energy 

/ : VB{x,t)^VB{at). (13) 

After the final timestep r, the constant potential energy 
at the new grid points . can be evaluated by using 
the maximum last values of t). Choosing the max- 
imum value for VB{^Gi) within certain small regions (5, 
where S is the half distance between two grid points gives 



(14) 



for the construction of the potential energy landscape in 
the new set of chosen reaction coordinates. The values 
for VB{£,G,i) be inserted into the Eqns. (j6j) and 

([7]) with the corresponding biased probabilities P{£,G,i)- 
The final potential energy landscape in the new set of col- 
lective variables of Eqn. (p!4|) is evaluated by the existing 
data and by explicit calculation of each ^(t). Sampling 
the biased probabilities in the new set of variables finally 
allows to determine the unbiased probabilities by using 
the histogram reweighting procedure in the new set of 
collective variables at any resolution. 
Again it has to be ensured that the motion is not con- 
strained and that the potential energy even activates 
lower eigenvectors to establish a free system behaviour. 
In principle all collective variables which are fastly vary- 
ing and cover a small subspace can be seen as suitable re- 
action coordinates if a sufficient simulation time is given. 
Additionally they also have to clearly distinguish between 
different states of conformations, can be well sampled 
meaning a good overlap in the distribution functions and 
covering of the phase space and show no hysteresis effects 
due to hidden complexities [27]. Several publications dis- 
cuss this problem and new techniques to overcome it in 
more detail ji, fiol - flil . lill-li^ . If these requirements are 
fulfilled, the projected energy landscape into the new col- 
lective variables can be seen as reliable. Nevertheless, it 
has to be noticed that the validity of a landscape is re- 
lated to the suitability of the chosen projected collective 
variables. This is also true for the original chosen coor- 
dinates. 



1- AND 2-DIMENSIONAL MODEL POTENTIAL 

The first test case for our approach is a particle con- 
fined in a one-dimensional well defined potential 

F(.) = J(i)^-10(^)=l ,15) 



where e is the unit thermal energy ksT and the length 
unit is given by a. We performed a Monte Carlo sim- 
ulation with the Metropolis criterion [65| consisting of 
10^ timesteps where every 500— th step a potential en- 
ergy with height u = 0.25e has been set. The grid points 
were separated by a grid constant = 0.1a. The cut- 
off radius for the construction of potential function has 
been chosen to Xc = 0.2cr = 2a^ and Xc = cr^ for the 
evaluation of the biasing potential. Having constructed 
the biasing energy landscape, we performed four simula- 
tions with 10^ timesteps to derive the biased probability 
distribution functions which have been reweighted by the 
WHAM equations (Eqns. (|6j) and (O). Fig. [2] presents 
the numerical results for the potential of Eqn. (fT5]) . The 
numerical values are in good agreement to the theoretical 
results of the test potential except for \x\ > 3. Oct. The 
reason for this misbehaviour can be related to the low sta- 
tistical accuracy of the probability distribution [66| which 
is caused by the rapid increase of the model potential at 
these points. The constant potential Vb{x) of Eqn. (|5]) as 
well as the corresponding discrete potential energy values 
of the grid points of Eqn. (j4|), which have been used for 
the biased simulations are shown in the lower panel of 
Fig. [21 Nevertheless the results of this simple model po- 
tential have shown that our method works and produces 
accurate results in one dimension. 
Another 2-dimensional test potential is given by 



(16) 

which represents four energetic minima. We performed 
a couple of Langevin Dynamics simulations [67] obeying 
the Langevin equation which is given by 



(17) 



where the force Fi on a particle depends on the fric- 
tion coefficient the velocity of the particle Vi and the 
stochastic force ffi. The stochastic force represents the 
thermal brownian motion of the particle with the follow- 
ing properties 



and 



<Vi{t) >=0 (18) 



< V^a{t)VJ^{t') >= 2CkBTS,jSa^S{t - t') (19) 



which ensures the presence of the ffuctuation dissipation 
relation. Thus the stochastic force is delta-correlated 
white noise which ensures a canonical ensemble at ther- 
mal equilibrium. We chose = 1^/me/cF with the mass 
m, the temperature T = 1 and ran a simulation of 10^ 
timesteps, where every 1000-th step a gaussian function 
has been set. We chose as values for the parameters 
(j^ = 0.25(7, Xc = 0.5(7, uj = O.le and we used a timestep 
of dt = O.Ola f^/me. The final values for Vrixci) of 
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FIG. 2: Top: Shifted energy landscape for the test poten- 
tial of Eqn. {[15]). The circles correspond to the values de- 
rived by the Histogram reweighted Metadynamincs approach 
whereas the black line corresponds to Eqn. ()15|) . The data 
points are in good agreement to Eqn. ()15p except for values 
\x\ > 3.0(7. This is due to the error of the statistical accuracy 
of the probability distribution which is caused by the rapid 
increase of the test potential. Bottom: Continous biasing 
potential Vb{x) (solid line) which is used for the calculation 
of the biasing probabilities and the corresponding values of 
the potential energy Vt{xq .) (points) evaluated at the grid 
points. 




x[a] ° 



FIG. 3: Values of VT{xG,i) which have been used in the biased 
simulations for the 2-dimensional test potential of Eqn. (|16p . 




Eqn. dU are shown in Fig.[3l These values have been used 
for the evaluation of the biasing potential of Eqn. (|5]). We 
performed four simulation runs with 2.5 • 10^ timesteps 
where again every 1000-th step the position of the par- 
ticle has been stored for the sampling of the underlying 
free energy landscape. The resulting reweighted energy 
profile is presented in Fig. |4l The good agreement to the 
energy profile of Eqn. ([16]) is obvious. Larger deviations 
only occur at positions x^y > 3.5<j above 60 ksT due to 
poor statistical accuracy. The standard deviation to the 
correct profile is around 5% per value. 



FIG. 4: Top: Energy profile of Eqn. ([16]). Bottom: Result- 
ing reweighted energy profile. The solid lines correspond to 
energy differences of bksT. 



NUMERICAL DETAILS 

All our Molecular Dynamics simulations have been 
erformed by the software package GROMACS [58- 
16011 in which our in-house written code for the His- 



togram reweighted Metadynamics method has been im- 
plemented. The source code is available on request by 
contacting one of the authors. 
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FIG. 5: Schematic representation of the alanine dipeptide 
with dihedral angles $ and ^. The numbered carbon 
atoms have been used for the eigenvector analysis. 



Alanine dipeptide 

We performed our Molecular Dynamics simulations by 
using_the GROMACS ports of the AMBER94 force field 



The simulation box contains 313 TIP3P water 
molecules [70| within a dimension of 2.005 x 2.194 x 2.192 
nm. The time step was 1 fs. The temperature T = 
300 K was kept constant by a Nose-Hoover thermostat. 
Electrostatics have been calculated by the PME (Particle 
Mesh Ewald) method. All bonds have been constrained 
by the SHAKE algorithm [71]. 

After a short equilibration run of 350 ps we performed 
a 500 ps unbiased simulation to analyze the eigenvec- 
tors of the system constructed by the numbered Ca car- 
bon atoms of Fig. [5l The corresponding potential energy 
landscape has been constructed in the phase space of the 
first two eigenvectors within a 2 ns simulation run. The 
grid constant has been chosen to = 0.01 nm, the 
height of the potential function has been set to cj = 0.2 
kJ/mol and the relaxation time was r = 1 ps. The cut- 
off radius Xc was 0.02 nm for hill setting and 0.01 nm 
for the evaluation of the biasing forces. The final biased 
probability distributions have been derived by four sev- 
eral independent 2 ns simulations with different temper- 
atures 300, 310, 320 and 330 K whose additional kinetic 
energy allows to explore the phase space more efficiently. 
Nevertheless the WHAM equations allow to combine the 
results for the different temperatures and give the results 
for a specific temperature if not too large deviations exist 
[39]. 

For a comparison of the eigenvector free energy land- 
scape we further applied a conventional metadynamics 
scheme as described as in 



The simulation time 
was 2 ns where every picosecond a gaussian hill of width 
0.01 nm with height 0.2 kJ/mol has been set. Addition- 
ally we used the software plug-in PLUMED [50, 51] for a 
conventional metadynamics simulation to directly calcu- 



late the free energy landscape of the Ramachandran plot. 
All parameters are identical to the histogram reweighted 
metadynamics scheme. 



Met-Enkephalin 

The molecule consists of five residues TYR-GLY-GLY- 
PHE-MET and its molecular structure has been taken 
from the PDB entry IPLW 0. The force field was 
GROMOS96 [73]. The box size has been chosen to 
3.214 X 3.214 X 3.214 nm with 1058 SPG water molecules. 
The time step was 2 fs and the temperature was kept 
constant by a Nose- Hoover thermostat. As mentioned 
above, all bonds have been constrained by the SHAKE 
algorithm [71]. Electrostatics have been again calculated 
by the PME (Particle Mesh Ewald) method. 
After a warm up phase of 350 ps we perfomed a 400 
ps simulation run for the analysis of the corresponding 
eigenvectors of all atoms at temperature T = 400 K. 
We chose this high temperature to capture all necessary 
transitions in the essential first eigenvectors. The corre- 
sponding potential energy grid has been constructed in 
a 1 ns simulation run where the grid constant has been 
chosen to cr^ = 0.01 nm and the height of the potential 
function has been set to uu = 0.2 kJ/mol with relaxation 
times of r = 2 ps. The cut-off radius Xc has been cho- 
sen to 0.2 nm respectively Xc = 0.1 nm as mentioned 
above. The biased simulations consist of three indepen- 
dent runs of T = 300 K and T = 305 K with 4 ns and 
T = 310 K with 1 ns simulation time. The corresponding 
timestep was St = 2 fs. All energy landscapes have been 
reweighted for a temperature T = 300 K. 



NUMERICAL RESULTS 
Alanine Dipeptide 

A well suited model system to test our approach is the 
alanine dipeptide (Fig. [5]). Alanine dipeptide is one of the 
most studied model systems over the last years 
80]. The corresponding eigenvalues and the cumulative 
positional fluctuations are shown in Fig. [6l It is obvious 
that nearly 80% of the overall positional fluctuation can 
be described by the first two eigenvectors. Thus it can be 
assumed that all important structural transitions should 
be captured by using these variables as reaction coordi- 
nates [sil. 

The corresponding positions for the 300 K biased sim- 
ulations are presented in the top (orange triangles) of 
Fig. [7] in contrast to an unperturbed simulation run (blue 
stars) for the same parameter sets. The bottom figure il- 
lustrates the values for the dihedral angles $ and ^ in 
the Ramachandran plot. It is obvious that the biasing 
potential energy landscape drives the system into more 
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FIG. 6: Fluctuation and cumulative relative positional fluc- 
tuation (inset) of the eigenvectors. Nearly 80% of the overall 
motion of the alanine dipeptide can be described by the first 
two eigenvectors. 
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FIG. 7: Top: Positions in the eigenvector space in the 300 
K biased simulation (orange triangles) and the unperturbed 
simulation results which are illustrated as blue stars. Bot- 
tom: Corresponding values for the dihedral angles $ and ^ 
sampled from the same simulation runs. 



FIG. 8: Free energy landscape for the eigenvectors 1 and 
2 of the alanine dipeptide. The lines correspond to energy 
differences of 0.5 kcal/mol. 
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FIG. 9: Top: Free energy landscape derived by the pro- 
jection of the eigenvectors 1 and 2 on the phase space of 
the dihedral angles $ and ^. The marked regions denote 
stable configurations with the corresponding molecular repre- 
sentation. The lines correspond to energy differences of 0.5 
kcal/mol. Bottom: Corresponding free energy landscape de- 
rived by a direct metadynamics simulation in the phase space 
of the corresponding dihedral angles $ and ^. 



unlikely minima to accelerate the rare events of these 
structural transitions. 

The final free energy landscapes at T = 300 K are 
shown in Figs. [8] and [9l Fig. [8] presents the free energy 
landscape in the space of eigenvectors whereas Fig. [9] is 
the corresponding Ramachandran plot in the space of the 
dihedral angles which has been derived by the proposed 
projection scheme and a direct conventional metadynam- 
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FIG. 11: Free energy landscape for the eigenvectors 1 and 
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0.05 



-0.05 -0.1 

EV 1 [nm] 



-0.15 -0.2 



tance. Previous studies have shown that the main shape 



FIG. 10: Free energy landscape for the first two eigenvectors 
in the conventional metadynamics scheme (top) and in the 
histogram reweighted metadynamics scheme (bottom). The 
lines denote energy differences of 0.5 kcal/mol. 



ics simulation run in the phase space of the dihedral an- 
gles. Both figures illustrate the corresponding conforma- 
tions of the alanine dipeptide and their relative energy 
difference from the most stable configuration aR to two 
further minima C7eq and C5. The relative energy differ- 
ence betweeen the aR and the C7eq and C5 minima is 
given by AF ^ 1.9 kcal/mol, respectively 2.2 kcal/mol 
which is in good agreement to the results reported in 
Ref. [si, Q with ^ 1.74 kcal/mol, respectively ^ 2.10 
kcal/mol. The good agreement in the location and the 
acceptable values of the free energy minima in compari- 
son to a direct biasing between the plots shown in Fig. [9] 
are remarkable which validates the proposed projection 
scheme. Finally we compare the results of the histogram 
reweighted metadynamics simulation in the phase space 
of the eigenvectors to a conventional metadynamics sim- 
ulation with 2000 hills. As the landscapes are nearly 
identical, it can be concluded that our method is valid. 
Only slight deviations can be observed for the specific 
shape of the minina due to the statistical error inherent 
in the original metadynamics variant [l^]. 



Met-Enkephalin 

The Met-Enkephalin has attracted broad interest in its 
stable conformations (sil-li^ due to its biological impor- 



of the free energy landscape is given by a funnel j83l-l85| 
with a couple of stable conformations which are sepa- 
rated by low energy barriers. Regarding the biological 
function of the Met-Enkephalin, which is an opioid pep- 
tide that inhibits neurotransmitter release from the ap- 
propriate opioid receptor, several receptors have to bind 
on the molecule which all require different stable confor- 
mations 86l, [STj. 

The corresponding eigenvector analysis of the Met- 
Enkephalin illustrates that over 73% of the atomic po- 
sitional fluctuation have been captured by the first two 
eigenvectors. Thus nearly all relevant structural transi- 
tions can be described by eigenvectors 1 and 2. 
The corresponding free energy landscape at T = 300 K 
is shown in Fig. [TTJ A couple of stable minima and con- 
figurations can be found in the funnel-like landscape in 
agreement to other publications (ssj-lsHj . The lines corre- 
spond to energy differences of 0.5 kcal/mol. It is obvious 
that all energetic barriers are lower than 1.5 kcal/mol. 
Hence the transitions between the stable configurations 
are not drastically hindered. Regarding the conforma- 
tions shown in Fig. (TTJ differences in their form and shape 
can be observed reflecting the biological function of the 
Met-Enkephalin [87|. The energetic flexibility in the il- 
lustrated conformations finally explains the lack of a poor 
experimental convergence to a single structure (ssj . 
Another interesting quantity is the solvent accessible sur- 
face area which allows the investigation of the importance 
of the solvent interactions for the Met-Enkephalin. The 
influence of the hydrophilic solvent accessible surface area 
Af , which represents the influence of hydration effects 
on the stability of the peptide can be investigated by the 
ratio to the total solvent accessible surface area Al which 
is illustrated in Fig. [121 
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FIG. 12: Free energy landscape for the ratio of the hy- 
drophihc solvent accessible surface area Af to the total sol- 
vent accessible surface area A* 



A large minimum can be found at a ratio of Af /Al ^ 
0.49. It can be shown that nearly all stable configura- 
tions presented in Fig. [11] obey this characteristic value 
for the ratio. Hence the formation of different stable 
conformations can be partly explained by hydration ef- 
fects due to the chemical structure and solubility of the 
Met-Enkephalin. The rapid increase in the free energy 
of 4 kcal/mol for fluctuations S{Af /Al) ^0.1 avoids the 
appearance of drastic structure transitions around this 
minimum. 

The rigidity and flexibility of the Met-Enkephalin and its 
residues is finally illustrated in Fig. [13] by a Ramachan- 
dran plot. Here the residues GLY-2, GLY-3 and PHE-4 
are investigated concerning their mechanical flexibility. 
All residues are able to form /3-sheets while especially 
the dihedral angles of PHE-4 prefer to visit the a-helical 
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COMPUTATIONAL COST 

We finally have investigated the computational cost of 
the histogram reweighted metadynamics scheme in com- 
parison to the ordinary metadynamics technique studied 
by the two dimensional test case of Eqn. ([16]). Thus 
we simulated the two dimensional test case by a conven- 
tional metadynamics scheme [27^ where every hundredth 
step a gaussian hill of height O.le with a width of 0.25cr 
has been set. The same values have also been used in 
the grid scheme. We compared the times needed for the 
construction of the potential energy landscape using the 
X and y direction as reaction coordinates and normalized 
them. 

To analyze the influence of the number of collective vari- 



FIG. 13: Free energy landscape in the Ramachandran plot 
for three residues. The lines correspond to energy differences 
of 1.0 kcal/mol. 



ables d we conducted simulations with 1,2,3,5 and 10 
collective variables. It is obvious that the grid meta- 
dynamics scheme scales as 0{dt) whereas conventional 
metadynamics obeys a 0{dt^) behaviour (Fig. [H]). Al- 
though the program is not computationally optimized 
due to output-input data flow, it becomes clear that the 
overall time needed for the conventional metadynamics 
algorithm is increasing with the second power of simula- 
tion steps, respectively present hills. Hence it is evident 
that the number of hills is the dominating factor. This 
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FIG. 14: Normalized computational cost for the construction 
of the potential energy landscape of Eqn. (|16|) in the grid 
scheme in comparison to conventional metadynamics for dif- 
ferent numbers of collective variables d— 1, 2, 3, 5 and 10. 
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FIG. 15: Ratio for the time needed to calculate the meta- 
dynamics forces versus all interactions for one timestep in 
dependence of an increasing number of hills for the alanine 
dipeptide. 

leads us to the conclusion that our method can be used 
for a larger number of dimensions as well with a better 
scaling behaviour than the conventional scheme. 
The ratio of the time used for the calculation of the meta- 
dynamics algorithm T^^^ compared to the time used for 
all interactions T^^ for an increasing number of appar- 
ent hills is shown in Fig. [151 All data points have 
been derived by the simulation of the alanine dipeptide. 
It is evident that after 250 hills the calculational cost 
increases drastically in the ordinary metadynamics algo- 
rithm whereas for the grid technique the required time 
remains constant. Thus the grid variant of the meta- 
dynamics algorithm accelerates the computation of free 
energy landscapes in contrast to ordinary metadynamics 



FIG. 16: Time needed for computation of the ordinary meta- 
dynamics forces T^^^ of Met-EnkephaUn and alanine dipep- 
tide in comparison to the time for unbiased simulations r^^ for 
a varying number of hills and number of relaxation steps 
Nr. All values follow a linear relation with slope c ^ 0.13 and 
the number of collective variables was d — 2. 



considerably. 

Nevertheless it can be assumed that the total time needed 
for the evaluation of the biasing forces in one timestep is 
negligible compared to the number of interactions espe- 
cially for systems with many degrees of freedom. Thus for 
a system with a large number of interactions, the cost for 
the evaluation of the metadynamics forces may decrease 
in comparison to the total time. 

To investigate this situation and to derive an empiri- 
cal formula, we studied the computational cost for the 
Met-Enkephalin and the alanine dipeptide. Hence we 
compared the total time which is required for the com- 
putation of the conventional metadynamics forces with- 
out a grid r^^^ to the total time of an unbiased run 
T^^. It can be assumed that the time for the calcula- 
tion of the metadynamics forces depends on the number 
of present hills n^, the number of collective variables d 
and the number of relaxation steps Nr after that a new 
hill is deposited. In addition, the contributing number 
of atoms in the collective variable may also influence the 
computation time. For reasons of simplicity we assume 
that this time is mainly determined by the evaluation of 
Q(x — .) (Eqn. ([2])) where all other effects are neg- 
ligible. Thus for much larger systems this factor may 
become size-dependent. 

In summary the total time needed for the evaluation of 
the ordinary metadynamics forces in a simulation can 
be written as r^^^^ ~ dNr^j^i j ~ dNrn^n^{n^ + 
l)^^MTi?/2 where dt^^j^ denotes the time needed for the 
evaluation of a single hill. Additionally it is further as- 
sumed that the simulation has finished after hills have 
been set which is given by the relation = NJN^. 
Furthermore the total time for an unbiased simulation r^^ 
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is nearly proportional to the number of atoms in the sys- 
tem N^^^^ [65,] and the number of simulation timesteps 
which yields r^^ ~ ^atom^s^'^ub with the average time 
St^^ needed for the calculation of a single unbiased force. 
This gives the ratio 



dNrn^{n^ + 1) ^^mtd 



dn„ St, 



2N N 



27V 



(20) 



where St^^^/5t^^ can be empirically determined. It has 
to be mentioned that the ratio is strongly dependent on 
the algorithms which are used for the evaluation of the 
forces but not on the processor speed. 
We have used varying relaxation timesteps of Nr = 
1 - 1000, different numbers of hills = 50 - 10000 
and the number of atoms was N^^^^ = 3231 for the Met- 
Enkephalin with water and N^^^^ =961 for the alanine 
dipeptide plus water whereas TVs varies between 100 and 
100000. 

The values for different ordinary metadynamics runs in 
comparison to unbiased runs for the Met-Enkephalin and 
the alanine dipeptide are shown in Fig. [TH It is evident 
that all values follow Eqn. ([2Q|) with a proportionality 
factor of c = 0.13 ± 0.01. Hence for a large number of 
present hills, the ordinary metadynamics method drasti- 
cally increases computation time. Thus as a general re- 
mark, the original method becomes computationally very 
expensive if the number of hills is larger than the number 
of atoms. This is often given for systems with implicit 
solvent models or complex and large energy landscapes. 
As we have shown before, the grid based technique obeys 
a linear 0{dt) behaviour, such that the ratio between the 
time needed for an ordinary metadynamics algorithm in 
comparison to the grid technique also grows in accor- 
dance to Eqn. (|2Q|) . Especially for ordinary metadynam- 
ics simulations where the free energy landscape is not 
refined by histogram reweighting procedures the amount 
of settled hills often increases the number of atoms which 
strongly suggests the use of grid based techniques. This is 
mostly important for classical Molecular Dynamics simu- 
lations for which our method is more intended in contrast 
to ab-intio methods. The evaluation of the electronic mo- 
tion is here the main factor [92*] dominating most of the 
computation time such that the evaluation of the meta- 
dynamics potential can be neglected. 



SUMMARY AND CONCLUSION 

We have presented an efficient and simple method for 
the calculation of free energy landscapes. The technique 
is applicable for a broad range of different systems. The 
basic principles are the construction of a potential land- 
scape on a predefined grid in an initial simulation run 
which is used as a biasing potential in the final simula- 
tions. The corresponding probability distributions of the 



biased simulations are reweighted by the WHAM proce- 
dure whose usage avoids specific drawbacks of the ordi- 
nary metadynamics algorithm like large error tolerances 

Another advantage is the easy implementation due to 
its simple methodology in software packages like GRO- 
MACS. In addition we have presented the application of 
a projection scheme which allows to transform the energy 
landscape of certain collective variables to further reac- 
tion coordinates without any additional simulation effort. 
The specific choice of biasing the first eigenvectors of the 
system achieves a good activation of the whole system. 
The history-dependent potential is short-ranged in con- 
trast to the ordinary Metadynamics scheme [26|, l27| and 
serves as a pure biasing potential. The fine resolution 
of the free energy landscape is achieved by histogram 
reweighting techniques of the corresponding probability 
distributions. A further advantage of the method is the 
opportunity to tune the resolution of the landscape even 
after the simulations were finished. 

Furthermore we have explicitly shown that a grid-based 
technique scales with 0{dt) in contrast to the conven- 
tional metadynamics scheme {0{dt'^)), where t denotes 
the simulation time and d the number of applied collec- 
tive variables. Hence, compared to the ordinary metady- 
namics algorithm a grid technique is more preferable due 
to a decreased simulation time. This has been shown by 
the computational cost for the alanine dipeptide and the 
Met-Enkephalin. Via a linear relation we were able to 
show that the computational cost of the ordinary meta- 
dynamics method in comparison to the grid technique 
can be predicted by an empirical formula. Thus the grid 
method is preferable for systems where the number of 
settled hills exceeds the number of all atoms in the sys- 
tem. As we have discussed, this fact is given for implicit 
solvation models as well as very large energy landscapes. 
In addition our method has been tested for the peptides 
alanine dipeptide and Met-Enkephalin. The results are 
in good agreement to the literature. We have further 
shown that the energy landscape of the Met-Enkephalin 
is funnel-like with many different conformations at sev- 
eral energetic minima. The similarity between these con- 
formations can be illustrated by a characteristic ratio of 
the hydrophilic solvent accessible surface area in compar- 
ison to the total solvent accessible area which results in a 
characteristic minimum around Af /Al ^ 0.49. The flex- 
ibility of the Met-Enkephalin is illustrated by plotting 
the dihedral angles of each residue in a Ramachandran 
plot. 

Additionally we have shown that the method allows to 
identify the stable conformations of the alanine dipep- 
tide with good accuracy of the free energy landscape. 
Following [27, 46,], where it was recognized that the func- 
tionality of a biasing potential does not depend on its 
specific shape, we have further validated our technique 
by the explicit calculation of two well-defined test cases 
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such that it can be used as an additional variation of the 
existing metadynamics methods. 
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